Classifying Political Orientation on Twitter: It's Not Easy!
Authors
Abstract
Numerous papers have reported great success at inferring the political orientation of Twitter users. This paper has some unfortunate news to deliver: while past work has been sound and often methodologically novel, we have discovered that reported accuracies have been systemically overoptimistic due to the way in which validation datasets have been collected, reporting accuracy levels nearly 30% higher than can be expected in populations of general Twitter users. Using careful and novel data collection and annotation techniques, we collected three different sets of Twitter users, each characterizing a different degree of political engagement on Twitter, from politicians (highly politically vocal) to “normal” users (those who rarely discuss politics). Applying standard techniques for inferring political orientation, we show that methods that previously reported greater than 90% inference accuracy actually achieve barely 65% accuracy on normal users. We also show that classifiers cannot be used to classify users outside the narrow range of political orientation on which they were trained. While a sobering finding, our results quantify and call attention to overlooked problems in the latent attribute inference literature that, no doubt, extend beyond political orientation inference: the way in which datasets are assembled and the transferability of classifiers.
Similar resources
Detecting Latent User Properties in Social Media
The ability to identify user attributes such as gender, age, regional origin, and political orientation solely from user language in social media such as Twitter or similar highly informal content has important applications in advertising, personalization, and recommendation. This paper includes a novel investigation of stacked-SVM-based classification algorithms over a rich set of original fea...
Non-lexical Features Encode Political Affiliation on Twitter
Previous work on classifying Twitter users’ political alignment has mainly focused on lexical and social network features. This study provides evidence that political affiliation is also reflected in features which have been previously overlooked: users’ discourse patterns (proportion of Tweets that are retweets or replies) and their rate of use of capitalization and punctuation. We find robust...
The Impact of Twitter on Lawmakers’ Decisions
Organizations have used social media extensively to engage customers, but little is known if such engagement truly influences organizations’ decisions to make them closer to their customers. This paper studies this question in a unique context – the impact of U.S. Representatives’ Twitter engagement on their voting behavior in The Congress. In particular, we consider whether the adoption and fr...
Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning
Twitter should be an ideal place to get a fresh read on how different issues are playing with the public, one that’s potentially more reflective of democracy in this new media age than traditional polls. Pollsters typically ask people a fixed set of questions, while in social media people use their own voices to speak about whatever is on their minds. However, the demographic distribution of us...
From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search
Recently, all major search engines introduced a new feature: real-time search results, embedded in the first page of organic search results. The content appearing in these results is pulled within minutes of its generation from the so-called “real-time Web” such as Twitter, blogs, and news websites. In this paper, we argue that in the context of political speech, this feature provides dispropor...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013